situation model


How LLMs Comprehend Temporal Meaning in Narratives: A Case Study in Cognitive Evaluation of LLMs

de Langis, Karin, Park, Jong Inn, Schramm, Andreas, Hu, Bin, Le, Khanh Chi, Mensink, Michael, Tong, Ahn Thu, Kang, Dongyeop

arXiv.org Artificial Intelligence

Large language models (LLMs) exhibit increasingly sophisticated linguistic capabilities, yet the extent to which these behaviors reflect human-like cognition versus advanced pattern recognition remains an open question. In this study, we investigate how LLMs process the temporal meaning of linguistic aspect in narratives that were previously used in human studies. Using an Expert-in-the-Loop probing pipeline, we conduct a series of targeted experiments to assess whether LLMs construct semantic representations and pragmatic inferences in a human-like manner. Our findings show that LLMs over-rely on prototypicality, produce inconsistent aspectual judgments, and struggle with causal reasoning derived from aspect, raising concerns about their ability to fully comprehend narratives. These results suggest that LLMs process aspect fundamentally differently from humans and lack robust narrative understanding. Beyond these empirical findings, we develop a standardized experimental framework for the reliable assessment of LLMs' cognitive and linguistic capabilities.
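
As a concrete illustration of the kind of targeted probe such a pipeline runs, the sketch below queries a chat model with a perfective/imperfective sentence pair and compares its event-completion judgments. This is a minimal sketch, not the authors' pipeline: the stimuli, the question wording, and the model choice are illustrative assumptions, and it assumes the `openai` Python package with an API key configured.

```python
# Minimal sketch of an aspect-probing experiment (illustrative only, not the
# authors' materials). Assumes the `openai` package and OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()

# The same event in perfective vs. imperfective aspect. Human readers infer
# the event is completed only in the perfective version.
ITEMS = [
    {"perfective": "Mary crossed the street.",
     "imperfective": "Mary was crossing the street.",
     "question": "Did Mary reach the other side? Answer Yes, No, or Unknown."},
]

def judge(sentence: str, question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model; an assumption, not the paper's choice
        messages=[{"role": "user", "content": f"{sentence}\n{question}"}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()

for item in ITEMS:
    a = judge(item["perfective"], item["question"])
    b = judge(item["imperfective"], item["question"])
    # Human-like comprehension predicts a committed answer for the perfective
    # sentence and a hedged/Unknown answer for the imperfective one.
    print(f"perfective -> {a!r} | imperfective -> {b!r}")
```

Running such probes over paraphrased variants of the same item is one way to check the judgment consistency that the paper reports as lacking.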


Requirements for Aligned, Dynamic Resolution of Conflicts in Operational Constraints

Jones, Steven J., Wray, Robert E., Laird, John E.

arXiv.org Artificial Intelligence

Deployed, autonomous AI systems must often evaluate multiple plausible courses of action (extended sequences of behavior) in novel or under-specified contexts. Despite extensive training, these systems will inevitably encounter scenarios where no available course of action fully satisfies all operational constraints (e.g., operating procedures, rules, laws, norms, and goals). To achieve goals in accordance with human expectations and values, agents must go beyond their trained policies and instead construct, evaluate, and justify candidate courses of action. These processes require contextual "knowledge" that may lie outside prior (policy) training. This paper characterizes requirements for agent decision making in these contexts. It also identifies the types of knowledge agents require to make decisions that robustly serve agent goals and align with human expectations. Drawing on both analysis and empirical case studies, we examine how agents need to integrate normative, pragmatic, and situational understanding to select and then to pursue more aligned courses of action in complex, real-world environments.


Query-OPT: Optimizing Inference of Large Language Models via Multi-Query Instructions in Meeting Summarization

Laskar, Md Tahmid Rahman, Khasanova, Elena, Fu, Xue-Yong, Chen, Cheng, TN, Shashi Bhushan

arXiv.org Artificial Intelligence

This work focuses on the task of query-based meeting summarization, in which the summary of a context (meeting transcript) is generated in response to a specific query. When using Large Language Models (LLMs) for this task, a new call to the LLM inference endpoint/API is required for each new query even if the context stays the same. However, repeated calls to the LLM inference endpoints would significantly increase the cost of using them in production, making LLMs impractical for many real-world use cases. To address this problem, we investigate whether combining queries over the same input context into a single prompt, thereby minimizing repeated calls, can be used successfully for meeting summarization. In this regard, we conduct extensive experiments comparing the performance of several popular LLMs (GPT-4, PaLM-2, LLaMA-2, Mistral, and FLAN-T5) in single-query and multi-query settings. We observe that while most LLMs respond to multi-query instructions, almost all of them (except GPT-4) fail to generate the response in the required output format, even after fine-tuning. We conclude that while multi-query prompting can reduce inference costs by cutting the number of calls to the inference endpoints/APIs for meeting summarization, the ability to reliably generate responses in the expected format is limited to only certain LLMs.
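
A minimal sketch of the single-query versus multi-query formats contrasted above; the prompt template, the JSON output convention, and the placeholder transcript are illustrative assumptions, not the paper's exact materials.

```python
# Sketch of multi-query prompting: one call answers N queries over the same
# transcript, instead of N single-query calls. The template is illustrative.
import json

transcript = "...meeting transcript..."  # placeholder context
queries = [
    "Summarize the key decisions.",
    "List all action items and their owners.",
    "What follow-up meetings were scheduled?",
]

# Single-query baseline: one API call per query, re-sending the context each time.
single_prompts = [f"Transcript:\n{transcript}\n\nQuery: {q}" for q in queries]

# Multi-query: all queries in one prompt, with a structured output format so
# each answer can be mapped back to its query.
numbered = "\n".join(f"{i + 1}. {q}" for i, q in enumerate(queries))
multi_prompt = (
    f"Transcript:\n{transcript}\n\n"
    f"Answer each query below. Respond with a JSON object mapping each query "
    f"number to its answer.\nQueries:\n{numbered}"
)

def parse_multi_response(text: str) -> dict:
    """Parse the model's JSON reply. Failures at exactly this step are the
    format-reliability problem the paper reports for most models."""
    return json.loads(text)
```

The cost saving comes from sending the (long) transcript once per batch of queries rather than once per query; the trade-off is that the whole batch fails if the model breaks the output format.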


Causal interventions expose implicit situation models for commonsense language understanding

Yamakoshi, Takateru, McClelland, James L., Goldberg, Adele E., Hawkins, Robert D.

arXiv.org Artificial Intelligence

Accounts of human language processing have long appealed to implicit ``situation models'' that enrich comprehension with relevant but unstated world knowledge. Here, we apply causal intervention techniques to recent transformer models to analyze performance on the Winograd Schema Challenge (WSC), where a single context cue shifts interpretation of an ambiguous pronoun. We identify a relatively small circuit of attention heads that are responsible for propagating information from the context word that guides which of the candidate noun phrases the pronoun ultimately attends to. We then compare how this circuit behaves in a closely matched ``syntactic'' control where the situation model is not strictly necessary. These analyses suggest distinct pathways through which implicit situation models are constructed to guide pronoun resolution.
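
The causal-intervention technique at issue can be illustrated with activation patching. The sketch below is a rough illustration rather than the authors' method: it records an attention layer's output on one Winograd variant and patches it into a run on the matched variant, then inspects how continuation logits shift. It patches a whole layer's output where the paper analyzes individual heads, and the layer index, sentences, and probe words are arbitrary assumptions.

```python
# Activation-patching sketch on a Winograd-style pair, using GPT-2 via
# Hugging Face transformers. Illustrative only; the paper's circuit analysis
# operates at the level of individual attention heads.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# One context cue ("feared" vs. "advocated") flips the natural referent of "they".
base = "The city councilmen refused the demonstrators a permit because they feared"
alt  = "The city councilmen refused the demonstrators a permit because they advocated"

def logits_for(text, patch=None, layer=5):
    ids = tok(text, return_tensors="pt").input_ids
    cache = {}
    def hook(mod, inp, out):
        if patch is None:
            cache["attn"] = out[0].detach()  # record the clean activation
            return None
        patched = out[0].clone()
        patched[:, -1, :] = patch[:, -1, :]  # overwrite the final position
        return (patched,) + out[1:]
    h = model.transformer.h[layer].attn.register_forward_hook(hook)
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    h.remove()
    return logits, cache.get("attn")

# Clean run on `base`, then run `alt` with the base activation patched in.
_, clean_attn = logits_for(base)
patched_logits, _ = logits_for(alt, patch=clean_attn)
# Inspect continuation preferences to gauge how much of the cue's causal
# effect flows through this layer's attention output.
for w in [" violence", " freedom"]:
    wid = tok.encode(w)[0]
    print(w, patched_logits[wid].item())
```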


Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy

Schlangen, David

arXiv.org Artificial Intelligence

How does one measure "ability to understand language"? If it is a person's ability that is being measured, this is a question that almost never poses itself in an unqualified manner: Whatever formal test is applied, it takes place against the background of the person's language use in daily social practice, and what is measured is a specialised variety of language understanding (e.g., of a second language; or of written, technical language). Computer programs do not have this background. What does that mean for the applicability of formal tests of language understanding? I argue that such tests need to be complemented with tests of language use embedded in a practice, to arrive at a more comprehensive evaluation of "artificial language understanding". To do such tests systematically, I propose to use "Dialogue Games" -- constructed activities that provide a situational embedding for language use. I describe a taxonomy of Dialogue Game types, linked to a model of underlying capabilities that are tested, thereby giving an argument for the \emph{construct validity} of the test. I close by showing how the internal structure of the taxonomy suggests an ordering from more specialised to more general situational language understanding, which can potentially provide some strategic guidance for development in this field.


What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis

Schlangen, David

arXiv.org Artificial Intelligence

Even in our increasingly text-intensive times, the primary site of language use is situated, co-present interaction. It is primary ontogenetically and phylogenetically, and it is arguably also still primary in negotiating everyday social situations. Situated interaction is also the final frontier of Natural Language Processing, where, compared to the area of text processing, very little progress has been made in the past decade, and where a myriad of practical applications is waiting to be unlocked. While the usual approach in the field is to reach, bottom-up, for the ever next "adjacent possible", in this paper I attempt a top-down analysis of what the demands are that unrestricted situated interaction makes on the participating agent, and suggest ways in which this analysis can structure computational models and research on them. Specifically, I discuss representational demands (the building up and application of world model, language model, situation model, discourse model, and agent model) and what I call anchoring processes (incremental processing, incremental learning, conversational grounding, multimodal grounding) that bind the agent to the here, now, and us.
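
The enumeration of representational demands invites a structural reading; below is a minimal sketch of how those five models and one anchoring process might be laid out as agent state. All class, field, and method names are hypothetical illustrations of the paper's top-down analysis, not an implementation from it.

```python
# Sketch: the representational demands above as explicit components of a
# situated agent's state. Names are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class SituatedAgentState:
    world_model: dict = field(default_factory=dict)      # general knowledge of how the world works
    language_model: dict = field(default_factory=dict)   # mappings between forms and meanings
    situation_model: dict = field(default_factory=dict)  # the here-and-now: entities, locations, states
    discourse_model: list = field(default_factory=list)  # what has been said so far, and by whom
    agent_model: dict = field(default_factory=dict)      # beliefs about the interlocutor

    def anchor(self, event: dict) -> None:
        """Anchoring process (incremental processing): fold each observation
        or utterance into the relevant models as it arrives, binding the
        agent to the here, now, and us."""
        if event.get("type") == "utterance":
            self.discourse_model.append(event)
        elif event.get("type") == "percept":
            self.situation_model.update(event.get("content", {}))
```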


A Hierarchical Framework for Collaborative Artificial Intelligence

Crowley, James L., Coutaz, Joëlle L., Grosinger, Jasmin, Vázquez-Salceda, Javier, Angulo, Cecilio, Sanfeliu, Alberto, Iocchi, Luca, Cohn, Anthony G.

arXiv.org Artificial Intelligence

We propose a hierarchical framework for collaborative intelligent systems. This framework organizes research challenges based on the nature of the collaborative activity and the information that must be shared, with each level building on capabilities provided by lower levels. We review research paradigms at each level, with a description of classical engineering-based approaches and modern alternatives based on machine learning, illustrated with a running example using a hypothetical personal service robot. We discuss cross-cutting issues that occur at all levels, focusing on the problem of communicating and sharing comprehension, the role of explanation and the social nature of collaboration. We conclude with a summary of research challenges and a discussion of the potential for economic and societal impact provided by technologies that enhance human abilities and empower people and society through collaboration with Intelligent Systems.


Towards Situation Awareness and Attention Guidance in a Multiplayer Environment using Augmented Reality and Carcassonne

Kadish, David, Sarkheyli-Hägele, Arezoo, Font, Jose, Niehorster, Diederick C., Pederson, Thomas

arXiv.org Artificial Intelligence

Many senses (smell, touch, hearing, and sight) can potentially be augmented, though the most common application of AR is sight, using a head-mounted display [2]. Several users may simultaneously access and operate a shared digitally augmented environment, either in the same place or remotely. Users commonly interact with each other and the augmented elements in this virtual framework by using hand gestures, movement, and even gaze. The interactive nature of AR, as well as its direct connection to the real world, has produced extensive research work and industrial applications of AR in fields such as education, entertainment, medicine, and retail [6]. Human-computer interaction in games (HCI-games) is a very broad field that covers research on the many ways in which human players interact with digital games, which, given their interactive, playful, and challenging nature, present a rich field of study separate from human-computer interaction in other forms of software [1].


SMART: A Situation Model for Algebra Story Problems via Attributed Grammar

Hong, Yining, Li, Qing, Gong, Ran, Ciao, Daniel, Huang, Siyuan, Zhu, Song-Chun

arXiv.org Artificial Intelligence

Solving algebra story problems remains a challenging task in artificial intelligence, requiring a detailed understanding of real-world situations and strong mathematical reasoning capability. Previous neural solvers of math word problems directly translate problem texts into equations, lacking an explicit interpretation of the situations, and often fail to handle more sophisticated situations. To address these limitations of neural solvers, we introduce the concept of a \emph{situation model}, which originates in psychology studies as a representation of humans' mental states during problem-solving, and propose \emph{SMART}, which adopts attributed grammar as the representation of situation models for algebra story problems. Specifically, we first train an information extraction module to extract nodes, attributes, and relations from problem texts and then generate a parse graph based on a pre-defined attributed grammar. An iterative learning strategy is also proposed to further improve the performance of SMART. To rigorously study this task, we carefully curate a new dataset named \emph{ASP6.6k}. Experimental results on ASP6.6k show that the proposed model outperforms all previous neural solvers by a large margin while preserving much better interpretability. To test these models' generalization capability, we also design an out-of-distribution (OOD) evaluation, in which problems are more complex than those in the training set. Our model exceeds state-of-the-art models by 17\% in the OOD evaluation, demonstrating its superior generalization ability.
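
To make the nodes/attributes/relations representation concrete, here is a toy parse-graph-style encoding of a simple story problem together with a read-off solver. The problem, the graph schema, and the relation vocabulary are invented for illustration and are not SMART's attributed grammar.

```python
# Illustrative situation-model graph for an algebra story problem, in the
# spirit of SMART's nodes/attributes/relations (schema is hypothetical).
# Problem: "Ann has 3 apples. Bob has twice as many apples as Ann.
#           How many apples do they have together?"
nodes = {
    "ann":  {"type": "agent", "quantity": 3},
    "bob":  {"type": "agent", "quantity": None},  # unknown, defined by a relation
    "goal": {"type": "query", "expr": "ann + bob"},  # the question being asked
}
relations = [("bob", "times", "ann", 2)]  # bob.quantity = 2 * ann.quantity

# Solving reads quantities off the graph instead of translating text
# straight to equations, keeping the interpretation explicit.
quantities = {k: v["quantity"] for k, v in nodes.items()
              if v.get("quantity") is not None}
for target, rel, source, factor in relations:
    if rel == "times":
        quantities[target] = factor * quantities[source]
print(sum(quantities.values()))  # -> 9, answering the "goal" query
```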


Toward a Narrative Comprehension Model of Cinematic Generation for 3D Virtual Environments

Cassell, Bradley Alan (North Carolina State University)

AAAI Conferences

Most systems for generating cinematic shot sequences for virtual environments focus on the low-level problems of camera placement. While this approach will create a sequence of camera shots that film individual events in a virtual environment, it does not account for the high-level effects shot sequences have on viewer inferences. There are systems based on well-known cinematography principles, such as the rule of thirds and other framing rules; however, these usually utilize schemas or predefined shots and do not reason about the high-level cognitive effects on the viewer. In this paper, a system is proposed that can reason directly about these high-level cognitive and narrative effects of a shot sequence on the viewer's mental state.